
Metadata extension to CDM

I would like to announce a proposal to extend metadata in the CDM.
The proposal is described at the CDM workgroup website


The proposal is motivated by our efforts to look at the data quality of various tables. A free-text description of several domains and of the type of dataset (general population vs. something special) would be a good addition to the CDM.

An overview of all other proposals is also available here (the proposal above is listed as ‘metadata’).

@Ajit_Londhe and I would like to participate in this discussion. I think the second description listed on the Wiki is outdated compared with what we have implemented on our side. @Ajit_Londhe even developed an Achilles report exposing the domain notes from the data we are testing out.

However, since implementing this beta idea on our side, @Ajit_Londhe and I have had other ideas about generalizing the table further to store other metadata about the CDM (e.g., CDM run times).

@Ajit_Londhe and I will meet up next week to do a better job of documenting our ideas. We are open to other thoughts and input.



Please add the link to this Forum to the Wiki page proposal, so people can find it.

Also, can you invite Dino? He has a lot of ideas and needs for metadata.

@ericaVoss yes, I’d be interested in participating as well.


@Christian_Reich - done, @Vojtech_Huser beat me to it. :smile:

Interested as well.


please sign me up for this as well. Thanks

@Vojtech_Huser, @dgambone, @t_abdul_basser, & @gregk,

@Ajit_Londhe and I drafted our ideas in the Wiki. We wanted to pass them by Vojtech first since he has been thinking about this the longest. Then I figure we could send out the notes to all of you and discuss via email or set up a meeting.

Long way of saying . . . we are still thinking about this but getting some thoughts together first - I think we’ll have a more productive discussion with something tangible to provide feedback on.

I like the new update. For others puzzled by which concept_id values can be used in the METADATA table, do an advanced search in Atlas (under vocabulary) like this (pick OMOP Domain)

Finally got my act together. Here is the MetaData Doodle for when we could meet to discuss:

@Vojtech_Huser, @Ajit_Londhe, @dgambone, @t_abdul_basser, & @gregk please let me know which date/times work for you.

Looks like 10/26 @ 3:00PM.

@gregk, can you PM me your email so I can send the meeting invite?

Does anyone have @gregk’s email?

Hope to see everyone at 3PM EST tomorrow! If you don’t have the meeting invite on your calendar let me know!


Timely. Just back from a metadata conference where one “next-step” subject was alignment of post-11179 metadata standards. Also, we did some metadata modeling for the ONC pilot - not super proud of it, but a start.

I’m on fumes as bandwidth goes, but perhaps we can get someone from Columbia and Eric+Josh to participate in metadata discussion here. I opened the topic last week w/ PMI.

Daniella, email me who you want me to invite and I will.

@ericaVoss, can you please add to the proposal two more examples of what values could go into the column METADATA_TYPE_CONCEPT_ID?

Also, if a concept exists (e.g., visit type), do we expect one row per METADATA_CONCEPT_ID, or can people still submit multiple name-value pairs (where one of the names equals the concept name)?

If a METADATA_CONCEPT_ID does not exist for a metadata entity (e.g., the death table was using 2014-June state death certificate data), do we populate METADATA_CONCEPT_ID with a concept of 0?

Per today’s meeting I added all the domains to the examples. Vojtech will prepare to propose at the next CDM team meeting.

I was asked by email (@schillil) about the ability to attach metadata to columns. The current proposal provides a shell, and the “Athena terminology” provides concepts. So to comment on a column, one can use concepts for that, e.g., visit type.

See the highlighted concept below. There are 54 concepts to pick from at the moment, but we anticipate additional concepts being created at the request of “metadata documenters”. Again, we want to give a generic tool for metadata that is “extensible” as needed.

Daniela mentioned today other metadata literature, such as DDI or ISO-11179, as well as HL7 standards for metadata.

Rimma proposed to make the metadata table like the observation table (value_as_string, value_as_concept_id).
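
A rough way to picture this suggestion is the Python sketch below. It is only an illustration, not the proposal's actual table definition: the class name `MetadataRow`, the `name` field, and the placeholder ids are assumptions made for this example. Only `metadata_concept_id`, `metadata_type_concept_id`, `value_as_string`, and `value_as_concept_id` are names that appear in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRow:
    """Hypothetical metadata row shaped like an OBSERVATION record."""

    metadata_concept_id: int       # what the entry is about
    metadata_type_concept_id: int  # what kind of metadata entry this is
    name: str                      # assumed free-text name of the name-value pair
    value_as_string: Optional[str] = None      # value meant for humans to read
    value_as_concept_id: Optional[int] = None  # coded value meant to be computable

    def is_computable(self) -> bool:
        """True when the value is coded as a concept rather than free text."""
        return self.value_as_concept_id is not None


# Placeholder ids of 0 are used because the thread does not give real concept ids.
note = MetadataRow(
    metadata_concept_id=0,
    metadata_type_concept_id=0,
    name="death data source",
    value_as_string="2014-June state death certificate data",
)
print(note.is_computable())
```

Keeping both value fields on one row would let the same shell hold free-text notes and coded values, which is the choice the scope question in this discussion turns on.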

Should the scope be computable metadata or metadata for humans to read?

I would like to continue the metadata discussion at the upcoming CDM WG call.

I created a modified table proposal that possibly addresses some of the points raised during the last discussion.

The key is not to confuse metadata with data characterization as done by Achilles (the achilles_results table). An ETL or data warehouse insider knows a lot about a warehouse, and the point of metadata is to capture some of this “insider” knowledge so that a user (or analyst) can quickly get “semi-intimate” with the data just by reading some smart and organized notes made by the insider.

Perhaps we can propose a shell and let the community decide how best to use it to hold useful metadata content, and in phase 2 make the metadata tighter and better. The perfect should not be the enemy of the good in this phase 1.

Perhaps every WG member can provide examples of metadata that they would like to capture (and post here).

Mine would be:

  • dataset is updated once a year (or monthly or …)
  • dataset reflects only data from clinical trial (not routine care)
  • Achilles is executed after each data refresh. achilles_results are always available
  • dataset has drug order data as well as pharmacy dispensation data (can study ‘patient did not fill his prescription’ questions)
  • weight data comes from Health Risk Assessment done by health plan (not from EHR)
  • PHR data is present (in OBSERVATION table) but not mapped to any standard concepts and there are no plans to do this mapping
  • procedure data are in local codes only (any phenotype using procedures has to tweak the standard code in the phenotype to the local code)
  • dataset has EHR data only (and no claims data; site has no affiliated health plan)
  • dataset has claims data with “sparse lab data” (e.g., all that come from an accessible source, such as LabCorp). Such data does not reflect all lab data. Inpatient lab results are not present. (not available)
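
Notes like these could be stored as simple name-value pairs. The sketch below is one hedged illustration using Python's built-in `sqlite3` module. The table name `metadata_notes`, its two columns, and the note names are hypothetical and chosen only for this example. They are not the proposal's DDL.

```python
import sqlite3

# Hypothetical two-column layout; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata_notes (name TEXT NOT NULL, value TEXT NOT NULL)")

# A few of the example notes from the list above, paired with made-up names.
notes = [
    ("refresh_frequency", "dataset is updated once a year"),
    ("data_source", "dataset has EHR data only (no claims data)"),
    ("procedure_coding", "procedure data are in local codes only"),
]
conn.executemany("INSERT INTO metadata_notes (name, value) VALUES (?, ?)", notes)


def read_notes(connection: sqlite3.Connection) -> dict:
    """Return all stored notes as a name -> value dictionary."""
    rows = connection.execute("SELECT name, value FROM metadata_notes ORDER BY name")
    return dict(rows.fetchall())


for name, value in read_notes(conn).items():
    print(f"{name}: {value}")
```

An analyst new to the dataset could then read every note with a single query before writing any study code.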

UPDATE: after the CDM WG April meeting, the proposal was updated with phase 1 and phase 2 scope, and the use cases were updated.


To continue the discussion, I will try to tag folks to contribute example metadata they see as important.

@Christian_Reich @ericaVoss @Sigfried_Gold @rimma
This was my original input-seeking post:

Perhaps every WG member can provide examples of metadata that they would like to capture

Examples from OHDSI GitHub are:

A search for CDM_DOMAIN_META on GitHub shows many other examples.

It seems like the table has 2 columns (“name value pairs”). Can someone post the DDL for the CDM_DOMAIN_META table (or where it can be found) @ericaVoss ?

The examples above again indicate the mode of “how to get semi-intimate” with a dataset by reading some smart metadata notes done by the dataset custodian.