We’ve discussed some pretty good ideas around metadata and annotations over the past few months, and have tried some site-specific proofs of concept, but haven’t made real progress so far on a community effort. One of my main goals for 2018 is to push this forward.
I presented a poster at the last Symposium with @ericaVoss and @Vojtech_Huser on the value of metadata and annotations in preventing poor study design. This poster focused on data-source-specific artifacts: data collection anomalies, vocabulary mapping changes, and Achilles Heel annotations. I also demonstrated a version of Atlas/WebAPI that could consume metadata and annotations in order to help users steer clear of danger with their study design.
After posting a thread in the Forums (Annotations in the CDM), @jon_duke presented a great use case for clinical annotation in an OHDSI community call, and demonstrated a web app used at GA Tech that relies upon a custom schema (Jon – can you share the ER diagram?). This dialogue showed that we have other valuable use cases to consider, but that we could conceivably unify the various levels of metadata into one construct.
From that call, two key questions came up:
1. Couldn’t fact_relationship handle most of our metadata needs?
@hripcsa raised this question, and it’s a good one. Could we store metadata in fact_relationship rather than create new table(s)? It’s certainly possible to add facts about patients or concepts, but the table seems too rigid: it contains purely numeric fields, which leave no room for human-authored annotation strings or temporal bounds. It would be challenging to force-fit some of the types of information we’re looking to store, such as our classic examples of a drop-off in death data in 2011, or an unmappable clinical note about a patient made retroactively.
Additionally, metadata and annotations are data not collected during the patient’s observation period. As such, I would be hesitant to store them in the CDM schema, which is intended to “include all observational health data elements (experiences of the patient receiving health care)”; instead, I feel they would be better served in the results schema, or, perhaps better still, their own metadata schema.
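To make the rigidity concern concrete, here is a rough Python sketch that contrasts the five integer columns of fact_relationship with the kind of record an annotation might need. The `Annotation` class, its field names, and the `applies_on` helper are purely hypothetical illustrations, not a proposed standard, and the dates in the example are made up to stand in for the 2011 death-data case.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FactRelationship:
    """The five columns of the CDM fact_relationship table, all integers."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


@dataclass(frozen=True)
class Annotation:
    """HYPOTHETICAL annotation record; every field name here is an assumption."""
    target_domain: str                 # what is annotated, e.g. "death"
    target_id: Optional[int]           # a specific row, or None for the whole domain
    note_text: str                     # free text written by a person
    valid_start_date: Optional[date]   # None means open-ended
    valid_end_date: Optional[date]     # None means open-ended
    author: str


def applies_on(annotation: Annotation, day: date) -> bool:
    """Return True if the annotation's temporal bounds cover the given day."""
    starts_ok = annotation.valid_start_date is None or annotation.valid_start_date <= day
    ends_ok = annotation.valid_end_date is None or day <= annotation.valid_end_date
    return starts_ok and ends_ok


# fact_relationship has no column that can hold text or a date.
only_integers = all(f.type is int for f in fields(FactRelationship))

# Illustrative values only: the exact dates and wording are assumptions.
death_dropoff = Annotation(
    target_domain="death",
    target_id=None,
    note_text="Death records drop off sharply from this point onward.",
    valid_start_date=date(2011, 1, 1),
    valid_end_date=None,
    author="data steward",
)
```

With a record like this, a tool such as Atlas could in principle call `applies_on(death_dropoff, study_start)` to warn a user whose study window overlaps the affected period, which is the kind of check that is awkward to express with integer-only columns.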
2. Okay, assuming we don’t store this in the CDM…should we form a WG? Or should this be a part of the CDM WG?
Personally, I think we should form a new WG. Metadata and annotations are valuable artifacts at every level of a data source, but agreeing on standards for generating and storing them will be challenging. We will need to capture a diversity of use cases and understand how sites could realistically benefit from this new metadata to ensure that we’re not just creating more data for the sake of it. The CDM WG has a lot of key decisions to make that require significant discussion, and I worry that adding metadata and annotations to the mix will be detrimental to both efforts.
If there are no objections to creating a new WG, I volunteer to lead it. I would ask that we establish a GitHub repo to propose, vote on, and disseminate our Metadata/Annotation standards, similar to how @clairblacketer has done with the CDM repo. As with Themis, once metadata/annotation standards are agreed upon by this WG, we would take them to the CDM WG for final ratification into the CDM specification.
So who’s interested in joining this journey about the journey?
Tagging some additional folks from previous posts (only 10 tags allowed? c’mon!): @mgurley @jenniferduryea @lilipeng @Gowtham_Rao @Evan_Minty