The proposal is motivated by our efforts to look at the data quality of various tables. A free-text description of several domains and of the type of dataset (general population vs. something special) would be a good addition to the CDM.
@Ajit_Londhe and I would like to participate in this discussion. I think the second description listed on the Wiki is outdated compared with what we have implemented on our side. @Ajit_Londhe even developed an ACHILLES report exposing the domain notes from this data we are testing out.
However, since implementing this beta idea on our side, @Ajit_Londhe and I have had other ideas about generalizing the table further to store other metadata about the CDM (e.g., CDM run times).
@Ajit_Londhe and I will meet up next week to do a better job of documenting our ideas. We are open to other thoughts and input.
@Ajit_Londhe and I drafted our ideas in the Wiki. We wanted to pass them by Vojtech first since he has been thinking about this the longest. Then I figure we could send out the notes to all of you and discuss via email or set up a meeting.
Long way of saying . . . we are still thinking about this but getting some thoughts together first - I think we’ll have a more productive discussion with something tangible to provide feedback on.
I like the new update. For others puzzled by which concept_id values are available for the METADATA table, do an advanced search in Atlas (under Vocabulary) and pick OMOP Domain.
Timely. Just back from a metadata conference where one “next-step” subject was alignment of post-11179 metadata standards. Also, we did some metadata modeling for an ONC pilot - not super proud of it, but a start.
I’m on fumes as bandwidth goes, but perhaps we can get someone from Columbia and Eric+Josh to participate in metadata discussion here. I opened the topic last week w/ PMI.
@ericaVoss, can you please add to the proposal two more examples of values that could go into the column METADATA_TYPE_CONCEPT_ID?
Also, if a concept exists (e.g., visit type), do we expect one row per METADATA_CONCEPT_ID, or can people still submit multiple name-value pairs (where one of the names equals the concept name)?
If a METADATA_CONCEPT_ID does not exist for a metadata entity (e.g., the death table was using 2014-June state death certificate data), do we populate METADATA_CONCEPT_ID with a concept of 0?
I was asked in email (@schillil) about the ability to attach metadata to columns. The current proposal provides a shell and “Athena terminology” provides concepts. So to comment on a column, one can use concepts for that, e.g., visit type.
See the highlighted concept below. (There are 54 concepts to pick from at the moment, but we anticipate additional concepts being created at the request of “metadata documenters”.) Again, we want to provide a generic tool for metadata that is as “extensible” as we need.
The key is not to confuse metadata with data characterization as done by Achilles (the achilles_results table). An ETL or data warehouse insider knows a lot about a warehouse, and the point of metadata is to put some of this “insider” knowledge into metadata so that a user (or analyst) can quickly get “semi-intimate” with the data just by reading some smart and organized notes made by the insider.
Perhaps we can propose a shell and let the community decide how best to use it to capture useful metadata content, and in phase 2 make the metadata tighter and better. The perfect should not be the enemy of the good in this phase 1.
Perhaps every WG member can provide examples of metadata that they would like to capture (and post here).
Mine would be:
dataset is updated once a year (or monthly or …)
dataset reflects only data from clinical trial (not routine care)
Achilles is executed after each data refresh. achilles_results are always available
dataset has drug order data as well as pharmacy dispensation data (can study ‘patient did not fill his prescription’ questions)
PHR data is present (in OBSERVATION table) but not mapped to any standard concepts and there are no plans to do this mapping
procedure data are in local codes only (any phenotype using procedures has to tweak the standard code in the phenotype to the local code)
dataset has EHR data only (and no claims data; site has no affiliated health plan)
dataset has claims data with “sparse lab data” (e.g., all that come from an accessible source, such as LabCorp). Such data does not reflect all lab data. Inpatient lab results are not available.
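To make the examples above concrete, here is a minimal sketch of how a few of these notes might be stored as name-value pairs. It is not the proposal's DDL: only METADATA_CONCEPT_ID and METADATA_TYPE_CONCEPT_ID come from this thread, while the `name` and `value_as_string` columns, the note names, and the helper functions are hypothetical. Using concept 0 when no concept exists is also an assumption, since that question is still open in this discussion.

```python
import sqlite3

# Hypothetical shell table. Only metadata_concept_id and
# metadata_type_concept_id are named in the thread; the other
# columns are assumptions made for this sketch.
DDL = """
CREATE TABLE metadata (
    metadata_concept_id      INTEGER NOT NULL,  -- assumed 0 when no concept exists
    metadata_type_concept_id INTEGER NOT NULL,  -- 0 used as a placeholder here
    name                     TEXT    NOT NULL,
    value_as_string          TEXT
)
"""

# Free-text notes taken from the examples above; the names are invented labels.
NOTES = [
    ("refresh frequency", "dataset is updated once a year"),
    ("data source", "dataset has EHR data only (and no claims data)"),
    ("procedure coding", "procedure data are in local codes only"),
]


def load_notes(conn, notes):
    """Create the shell table and insert each note as one name-value row."""
    conn.execute(DDL)
    conn.executemany("INSERT INTO metadata VALUES (0, 0, ?, ?)", notes)
    conn.commit()


def read_notes(conn):
    """Return all (name, value) pairs, sorted by name."""
    query = "SELECT name, value_as_string FROM metadata ORDER BY name"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_notes(conn, NOTES)
for name, value in read_notes(conn):
    print(f"{name}: {value}")
```

An analyst could read such rows before running a study to learn, for instance, that procedure-based phenotypes need local codes at this site.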
UPDATE: after the CDM WG April meeting, the proposal was updated with phase 1 and phase 2 scope, and the use cases were updated.
Perhaps every WG member can provide examples of metadata that they would like to capture
Examples from the OHDSI GitHub are:
A search for CDM_DOMAIN_META on GitHub shows many other examples.
It seems like the table has 2 columns (“name value pairs”). Can someone post the DDL for the CDM_DOMAIN_META table (or where it can be found) @ericaVoss ?
The examples above again indicate the mode of “how to get semi-intimate” with a dataset by reading some smart metadata notes done by the dataset custodian.